AITopics | intelligibility prediction

Collaborating Authors

intelligibility prediction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Leveraging Multiple Speech Enhancers for Non-Intrusive Intelligibility Prediction for Hearing-Impaired Listeners

Cao, Boxuan, Li, Linkai, Yu, Hanlin, Mo, Changgeng, Zhou, Haoshuai, Wang, Shan Xiang

arXiv.org Artificial IntelligenceSep-23-2025

Speech intelligibility evaluation for hearing-impaired (HI) listeners is essential for assessing hearing aid performance, traditionally relying on listening tests or intrusive methods like HASPI. However, these methods require clean reference signals, which are often unavailable in real-world conditions, creating a gap between lab-based and real-world assessments. To address this, we propose a non-intrusive intelligibility prediction framework that leverages speech enhancers to provide a parallel enhanced-signal pathway, enabling robust predictions without reference signals. We evaluate three state-of-the-art enhancers and demonstrate that prediction performance depends on the choice of enhancer, with ensembles of strong enhancers yielding the best results. To improve cross-dataset generalization, we introduce a 2-clips augmentation strategy that enhances listener-specific variability, boosting robustness on unseen datasets. Our approach consistently outperforms the non-intrusive baseline, CPC2 Champion across multiple datasets, highlighting the potential of enhancer-guided non-intrusive intelligibility prediction for real-world applications.

artificial intelligence, intelligibility prediction, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2509.16979

Country: North America (0.28)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Therapeutic Area (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

Add feedback

No Audiogram: Leveraging Existing Scores for Personalized Speech Intelligibility Prediction

Zhou, Haoshuai, Mo, Changgeng, Cao, Boxuan, Li, Linkai, Wang, Shan Xiang

arXiv.org Artificial IntelligenceJun-4-2025

Personalized speech intelligibility prediction is challenging. Previous approaches have mainly relied on audiograms, which are inherently limited in accuracy as they only capture a listener's hearing threshold for pure tones. Rather than incorporating additional listener features, we propose a novel approach that leverages an individual's existing intelligibility data to predict their performance on new audio. We introduce the Support Sample-Based Intelligibility Prediction Network (SSIPNet), a deep learning model that leverages speech foundation models to build a high-dimensional representation of a listener's speech recognition ability from multiple support (audio, score) pairs, enabling accurate predictions for unseen audio. Results on the Clarity Prediction Challenge dataset show that, even with a small number of support (audio, score) pairs, our method outperforms audiogram-based predictions. Our work presents a new paradigm for personalized speech intelligibility prediction.

artificial intelligence, intelligibility score, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2506.02039

Country:

North America > United States > Florida > Hillsborough County > University (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area (0.97)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People

Zhou, Haoshuai, Cao, Boxuan, Mo, Changgeng, Li, Linkai, Wang, Shan Xiang

arXiv.org Artificial IntelligenceMay-14-2025

Speech foundation models (SFMs) have demonstrated strong performance across a variety of downstream tasks, including speech intelligibility prediction for hearing-impaired people (SIP-HI). However, optimizing SFMs for SIP-HI has been insufficiently explored. In this paper, we conduct a comprehensive study to identify key design factors affecting SIP-HI performance with 5 SFMs, focusing on encoder layer selection, prediction head architecture, and ensemble configurations. Our findings show that, contrary to traditional use-all-layers methods, selecting a single encoder layer yields better results. Additionally, temporal modeling is crucial for effective prediction heads. We also demonstrate that ensembling multiple SFMs improves performance, with stronger individual models providing greater benefit. Finally, we explore the relationship between key SFM attributes and their impact on SIP-HI performance. Our study offers practical insights into effectively adapting SFMs for speech intelligibility prediction for hearing-impaired populations.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.08215

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)

Add feedback

Transfer Learning from Whisper for Microscopic Intelligibility Prediction

Best, Paul, Cuervo, Santiago, Marxer, Ricard

arXiv.org Artificial IntelligenceApr-2-2024

Macroscopic intelligibility models predict the expected human word-error-rate for a given speech-in-noise stimulus. In contrast, microscopic intelligibility models aim to make fine-grained predictions about listeners' perception, e.g. predicting phonetic or lexical responses. State-of-the-art macroscopic models use transfer learning from large scale deep learning models for speech processing, whereas such methods have rarely been used for microscopic modeling. In this paper, we study the use of transfer learning from Whisper, a state-of-the-art deep learning model for automatic speech recognition, for microscopic intelligibility prediction at the level of lexical responses. Our method outperforms the considered baselines, even in a zero-shot setup, and yields a relative improvement of up to 66\% when fine-tuned to predict listeners' responses. Our results showcase the promise of large scale deep learning based methods for microscopic intelligibility prediction.

intelligibility prediction, listener, prediction, (14 more...)

arXiv.org Artificial Intelligence

2404.01737

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Speech foundation models on intelligibility prediction for hearing-impaired listeners

Cuervo, Santiago, Marxer, Ricard

arXiv.org Artificial IntelligenceJan-24-2024

Speech foundation models (SFMs) have been benchmarked on many speech processing tasks, often achieving state-of-the-art performance with minimal adaptation. However, the SFM paradigm has been significantly less explored for applications of interest to the speech perception community. In this paper we present a systematic evaluation of 10 SFMs on one such application: Speech intelligibility prediction. We focus on the non-intrusive setup of the Clarity Prediction Challenge 2 (CPC2), where the task is to predict the percentage of words correctly perceived by hearing-impaired listeners from speech-in-noise recordings. We propose a simple method that learns a lightweight specialized prediction head on top of frozen SFMs to approach the problem. Our results reveal statistically significant differences in performance across SFMs. Our method resulted in the winning submission in the CPC2, demonstrating its promise for speech perception applications.

backbone, intelligibility prediction, prediction, (14 more...)

arXiv.org Artificial Intelligence

2401.14289

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback